To explain factors affecting the resale prices of public housing in Singapore.
To build hedonic pricing models to explain factors affecting the resale prices of public housing in Singapore. The hedonic price models must be built by using appropriate GWR methods. This study focuses on four-room flat and transaction period from 1st January 2019 to 30th September 2020.
data.gov.sg
packages = c('tidyverse', 'sf', 'spdep', 'tmap', 'ggpubr',
'corrplot', 'units', 'olsrr', 'plyr',
'matrixStats', 'GWmodel', 'httr')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
Explanation on the uses of each package:
tidyverse: For data import and tidying. It also consist of several other packages specified below.
sf & spdep: For spatial data handling
tmap: For choropleth mapping
ggpubr: For arranging tmaps to be side-by-side for easy visualisation
corrplot: For multivariate data visualisation and analysis
matrixStats & units: For matrix manipulation
olsrr: For building OLS and performing diagnostics tests
plyr: For easy data manipulation
GWmodel: For calibrating geographical weighted family of models
httr: For API calls
In this section, we have to import and ensure that the geospatial data is in a format that we can use to perform analysis. We have to check if it is being assigned the right CRS and code. Also, since we are working in the boundary of Singapore, we have to ensure that it is in SVY21 with the EPSG code of 3414 being assigned as well.
The datasets that we will prepare in this section are:
mpsz_sf <- st_read(dsn = "data/geospatial", layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\wlwong2018\IS415 Blog\_posts\2021-11-06-take-home-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
mrt_sf <- st_read(dsn = "data/geospatial", layer = "MRTLRTStnPtt")
Reading layer `MRTLRTStnPtt' from data source
`C:\wlwong2018\IS415 Blog\_posts\2021-11-06-take-home-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 185 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21
bus_sf <- st_read(dsn = "data/geospatial", layer = "BusStop")
Reading layer `BusStop' from data source
`C:\wlwong2018\IS415 Blog\_posts\2021-11-06-take-home-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5156 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 4427.938 ymin: 26482.1 xmax: 48282.5 ymax: 52983.82
Projected CRS: SVY21
glimpse(mpsz_sf)
Rows: 323
Columns: 16
$ OBJECTID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15~
$ SUBZONE_NO <int> 1, 1, 3, 8, 3, 7, 9, 2, 13, 7, 12, 6, 1, 5, 1, 1,~
$ SUBZONE_N <chr> "MARINA SOUTH", "PEARL'S HILL", "BOAT QUAY", "HEN~
$ SUBZONE_C <chr> "MSSZ01", "OTSZ01", "SRSZ03", "BMSZ08", "BMSZ03",~
$ CA_IND <chr> "Y", "Y", "Y", "N", "N", "N", "N", "Y", "N", "N",~
$ PLN_AREA_N <chr> "MARINA SOUTH", "OUTRAM", "SINGAPORE RIVER", "BUK~
$ PLN_AREA_C <chr> "MS", "OT", "SR", "BM", "BM", "BM", "BM", "SR", "~
$ REGION_N <chr> "CENTRAL REGION", "CENTRAL REGION", "CENTRAL REGI~
$ REGION_C <chr> "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "~
$ INC_CRC <chr> "5ED7EB253F99252E", "8C7149B9EB32EEFC", "C35FEFF0~
$ FMEL_UPD_D <date> 2014-12-05, 2014-12-05, 2014-12-05, 2014-12-05, ~
$ X_ADDR <dbl> 31595.84, 28679.06, 29654.96, 26782.83, 26201.96,~
$ Y_ADDR <dbl> 29220.19, 29782.05, 29974.66, 29933.77, 30005.70,~
$ SHAPE_Leng <dbl> 5267.381, 3506.107, 1740.926, 3313.625, 2825.594,~
$ SHAPE_Area <dbl> 1630379.3, 559816.2, 160807.5, 595428.9, 387429.4~
$ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((31495.56 30..., MULT~
glimpse(mrt_sf)
Rows: 185
Columns: 4
$ OBJECTID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ~
$ STN_NAME <chr> "EUNOS MRT STATION", "CHINESE GARDEN MRT STATION", ~
$ STN_NO <chr> "EW7", "EW25", "NS14", "NS7", "EW18", "NS5", "EW28"~
$ geometry <POINT [m]> POINT (35782.96 33560.08), POINT (16790.75 36~
glimpse(bus_sf)
Rows: 5,156
Columns: 4
$ BUS_STOP_N <chr> "78221", "63359", "64141", "83139", "55231", "553~
$ BUS_ROOF_N <chr> "B06", "B01", "B13", "B07", "B02", "B03", "B10", ~
$ LOC_DESC <chr> "BLK 231A CP", "HOUGANG SWIM CPLX", "AFT JLN TELA~
$ geometry <POINT [m]> POINT (42227.96 39563.16), POINT (34065.75 ~
Results above show that:
Simple feature collection with 0 features and 15 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
[7] PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR
[13] Y_ADDR SHAPE_Leng SHAPE_Area geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 1 feature and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 22616.75 ymin: 47793.68 xmax: 22616.75 ymax: 47793.68
Projected CRS: SVY21
BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
264 47201 UNK <NA> POINT (22616.75 47793.68)
Results above show that:
Simple feature collection with 0 features and 15 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
[7] PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR
[13] Y_ADDR SHAPE_Leng SHAPE_Area geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
<0 rows> (or 0-length row.names)
Results above show that:
mrt_sf[duplicated(mrt_sf$STN_NAME),]
Simple feature collection with 19 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 22949.03 ymin: 28735.78 xmax: 42362.71 ymax: 46418.56
Projected CRS: SVY21
First 10 features:
OBJECTID STN_NAME STN_NO geometry
33 33 ANG MO KIO MRT STATION NS16 POINT (29807.27 39105.77)
97 105 MACPHERSON MRT STATION DT26 POINT (34235.8 34256.43)
103 111 BUGIS MRT STATION EW12 POINT (30491.56 31424.35)
114 122 EXPO MRT STATION DT35 POINT (42362.71 35285.67)
116 124 BUONA VISTA MRT STATION CC22 POINT (23251.76 32090.77)
121 129 MARINA BAY MRT STATION CE2 POINT (30423.43 28735.78)
124 132 CHINATOWN MRT STATION DT19 POINT (29238.97 29686.54)
134 142 TAMPINES MRT STATION DT32 POINT (40213.03 37463.37)
140 148 SERANGOON MRT STATION NE12 POINT (32480.07 36869.39)
144 152 PAYA LEBAR MRT STATION CC9 POINT (34550.58 33300.34)
bus_sf[duplicated(bus_sf$BUS_STOP_N),]
Simple feature collection with 13 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 13488.02 ymin: 29969.22 xmax: 44144.57 ymax: 47806.75
Projected CRS: SVY21
First 10 features:
BUS_STOP_N BUS_ROOF_N LOC_DESC
350 58031 UNK OPP CANBERRA DR
1569 51071 B21 MACRITCHIE RESERVOIR
2208 82221 B01 Blk 3A
2215 97079 B14 OPP ST. JOHN'S CRES
2392 62251 B03 BEF BLK 471B
2462 22501 B02 BLK 662A
2736 16119 NIL TELETECH PARK
2976 58229 B01A BLK 358A CP
3442 67421 NIL CHENG LIM STN EXIT B
3521 68091 B08 AFT BAKER ST
geometry
350 POINT (27089.69 47570.9)
1569 POINT (28305.37 36036.67)
2208 POINT (35308.74 33335.17)
2215 POINT (44144.57 38980.25)
2392 POINT (35500.36 39943.34)
2462 POINT (13488.02 35537.88)
2736 POINT (22318.89 29969.22)
2976 POINT (26094.86 47806.75)
3442 POINT (34741.77 42004.21)
3521 POINT (32038.84 43298.68)
Results above show that:
mrt_sf <- mrt_sf[!duplicated(mrt_sf$STN_NAME),]
bus_sf <- bus_sf[!duplicated(bus_sf$BUS_STOP_N),]
mrt_sf[duplicated(mrt_sf$STN_NAME),]
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO geometry
<0 rows> (or 0-length row.names)
bus_sf[duplicated(bus_sf$BUS_STOP_N),]
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
<0 rows> (or 0-length row.names)
Results above show that:
st_crs(mpsz_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(mrt_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(bus_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
From the results above:
mpsz_sf <- st_set_crs(mpsz_sf, 3414)
mrt_sf <- st_set_crs(mrt_sf, 3414)
bus_sf <- st_set_crs(bus_sf, 3414)
st_crs(mpsz_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(mrt_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(bus_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
Results above show that:
[1] 9
[1] 0
[1] 0
Results above show that:
[1] 0
[1] 0
[1] 0
Results above show that:
Reveal the distribution of HDB resale units prices in Singapore
Create an interactive point symbol map using tmap_mode(“view”)
tm_dots():
Lastly, tmap_mode(“plot”) to display plot mode
tmap_mode('view')
tm_shape(mpsz_sf) +
tm_polygons() +
tm_shape(bus_sf) +
tm_dots(alpha = 0.4,
col = "blue",
size = 0.03)
tmap_mode('plot')
Results above show that:
remove_busstop <- list(46211, 46219, 46239, 46609, 47701)
bus_sf[bus_sf$BUS_STOP_N %in% remove_busstop, ]
Simple feature collection with 5 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 17969.14 ymin: 49173.42 xmax: 20723.24 ymax: 52983.82
Projected CRS: SVY21 / Singapore TM
BUS_STOP_N BUS_ROOF_N LOC_DESC
765 47701 NIL JB SENTRAL
1532 46239 NA LARKIN TER
2257 46609 NA KOTARAYA II TER
2269 46211 NIL JOHOR BAHRU CHECKPT
4346 46219 NIL JOHOR BAHRU CHECKPT
geometry
765 POINT (20370.54 49173.42)
1532 POINT (17969.14 52983.82)
2257 POINT (20025.46 49261.6)
2269 POINT (20623.18 49199.99)
4346 POINT (20723.24 49200.05)
Results above show that:
bus_sf <- bus_sf[!bus_sf$BUS_STOP_N %in% remove_busstop, ]
bus_sf[bus_sf$BUS_STOP_N %in% remove_busstop, ]
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21 / Singapore TM
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
<0 rows> (or 0-length row.names)
Results above show that:
In this section, we will prepare the aspatial data sets which will be our independent variables for calibrating the hedonic pricing models in the later section.
Note: This section will not be ran to save computational time. Documentation on the *results are still written even though the output is not shown.
The datasets that we will prepare in this section are:
resale <- read_csv("data/aspatial/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv")
eldercare <- read_csv("data/aspatial/eldercare.csv")
hawker <- read_csv("data/aspatial/hawkercentre.csv")
park <- read_csv("data/aspatial/park.csv")
prisch <- read_csv("data/aspatial/general-information-of-schools.csv")
mall <- read_csv("data/aspatial/mall.csv")
supermarket <- read_csv("data/aspatial/supermarkets.csv")
kindergarten <- read_csv("data/aspatial/kindergarten.csv")
childcare <- read_csv("data/aspatial/childcare.csv")
glimpse(resale)
glimpse(eldercare)
glimpse(hawker)
glimpse(park)
glimpse(prisch)
glimpse(mall)
glimpse(supermarket)
glimpse(kindergarten)
glimpse(childcare)
Results above show that:
resale <- resale %>%
filter(flat_type == "4 ROOM",
month >= "2019-01" & month <= "2020-09")
eldercare <- eldercare %>%
select(NAME, Lng, Lat)
hawker <- hawker %>%
select(NAME, Lng, Lat)
park <- park %>%
select(NAME, Lng, Lat)
prisch <- prisch %>%
filter(mainlevel_code == "PRIMARY") %>%
select(school_name, address, gifted_ind)
supermarket <- supermarket %>%
select(NAME, Lng, Lat)
kindergarten <- kindergarten %>%
select(NAME, Lng, Lat)
childcare <- childcare %>%
select(NAME, Lng, Lat)
glimpse(resale)
glimpse(eldercare)
glimpse(hawker)
glimpse(park)
glimpse(prisch)
glimpse(mall)
glimpse(supermarket)
glimpse(kindergarten)
glimpse(childcare)
# glimpse(bus)
Results above show that:
resale[rowSums(is.na(resale))!=0,]
eldercare[rowSums(is.na(eldercare))!=0,]
hawker[rowSums(is.na(hawker))!=0,]
park[rowSums(is.na(park))!=0,]
prisch[rowSums(is.na(prisch))!=0,]
mall[rowSums(is.na(mall))!=0,]
supermarket[rowSums(is.na(supermarket))!=0,]
kindergarten[rowSums(is.na(kindergarten))!=0,]
childcare[rowSums(is.na(childcare))!=0,]
Results above show that:
resale dataset has a section on its own as there are different preparation steps to be done:
resale$street_name <- gsub("ST\\.", "SAINT", resale$street_name)
resale <- resale %>%
mutate(`address` = paste(`block`, `street_name`, sep=' '))
This function does the following:
library(httr)
geo_code <- function(address) {
url <- "https://developers.onemap.sg/commonapi/search"
query <- list("searchVal" = address, "returnGeom" = "Y", "getAddrDetails" = "N") #, "pageNum" = "1"
res <- GET(url, query = query, verbose())
output <- content(res) %>%
as.data.frame() %>%
select(results.X, results.Y, results.LONGITUDE, results.LATITUDE)
return(output)
}
This code chunk:
resale$X <- 0
resale$Y <- 0
resale$LONGITUDE <- 0
resale$LATITUDE <- 0
for (i in 1:1000) {
output <- geo_code(resale$address[i])
resale$X[i] <- output$results.X
resale$Y[i] <- output$results.Y
resale$LONGITUDE[i] <- output$results.LONGITUDE
resale$LATITUDE[i] <- output$results.LATITUDE
}
for (i in 15001:15901) {
output <- geo_code(resale$address[i])
if (resale$X[i] == 0) {
resale$X[i] <- output$results.X
resale$Y[i] <- output$results.Y
resale$LONGITUDE[i] <- output$results.LONGITUDE
resale$LATITUDE[i] <- output$results.LATITUDE
}
}
write_csv(resale, "data/aspatial/final_resale_prices_XY_LongLat.csv")
resale_new <- read_csv("data/aspatial/final_resale_prices_XY_LongLat.csv")
glimpse(resale_new)
Results above show that:
resale_new <- resale_new %>%
select(month, town, flat_type, storey_range, floor_area_sqm, remaining_lease, resale_price, address, LONGITUDE, LATITUDE)
resale_new <- resale_new %>%
mutate(yesno = 1) %>%
distinct %>%
pivot_wider(names_from = storey_range, values_from = yesno, values_fill = 0)
gsub() of Base R to remove the words years and months in remaining_lease column
substr() and substring() of Base R to place years and months in separate columns
as.double() of Base R to convert to a double data type
is.na() of Base R to locate NA values. Then, replace the NA values with 0.
mutate() of dplyr to create a new column that contains the remaining lease calculated in years.
Lastly, remove unnecessary columns:
resale_new$remaining_lease_new <- gsub("years", "", paste(resale_new$remaining_lease))
resale_new$remaining_lease_new <- gsub("month", "", paste(resale_new$remaining_lease_new))
resale_new$remaining_lease_new <- gsub("s", "", paste(resale_new$remaining_lease_new))
resale_new$remaining_lease_yr <- substr(resale_new$remaining_lease_new, start = 1, stop = 2)
resale_new$remaining_lease_mth <- substring(resale_new$remaining_lease_new, 5, 6)
resale_new$remaining_lease_yr <- as.double(resale_new$remaining_lease_yr)
resale_new$remaining_lease_mth <- as.double(resale_new$remaining_lease_mth)
resale_new$remaining_lease_mth[is.na(resale_new$remaining_lease_mth)] <- 0
resale_new <- resale_new %>%
mutate(`remaining_lease_final` = `remaining_lease_yr` + round((`remaining_lease_mth`/12),2))
drop <- c("remaining_lease_new", "remaining_lease_yr", "remaining_lease_mth")
resale_new <- resale_new[, !names(resale_new) %in% drop]
prisch dataset has a section on its own as there are different preparation steps to be done:
This code chunk:
prisch$LONGITUDE <- 0
prisch$LATITUDE <- 0
count <- 0
failed_count <- 0
for (i in 1:183){
tryCatch( {
output <- geo_code(prisch$address[i])
count <- count + 1
prisch$LONGITUDE[i] <- output$results.LONGITUDE
prisch$LATITUDE[i] <- output$results.LATITUDE
}, error = function(err) {
count <- count + 1
failed_count <- failed_count + 1
cat('Failed to extract',count,'out of',length(prisch$address),'addresses')
}
)
}
write_csv(prisch, "data/aspatial/prisch.csv")
prisch <- read_csv("data/aspatial/prisch.csv")
glimpse(prisch)
Results above show that:
prisch$school_name[prisch$LONGITUDE==0.0000]
Results above show that:
prisch$LONGITUDE[prisch$school_name == "HOUGANG PRIMARY SCHOOL"] <- 103.881072
prisch$LATITUDE[prisch$school_name == "HOUGANG PRIMARY SCHOOL"] <- 1.3783581
prisch$LONGITUDE[prisch$school_name == "JING SHAN PRIMARY SCHOOL"] <- 103.8520152
prisch$LATITUDE[prisch$school_name == "JING SHAN PRIMARY SCHOOL"] <- 1.3722578
prisch$LONGITUDE[prisch$school_name == "WEST GROVE PRIMARY SCHOOL"] <- 103.6989833
prisch$LATITUDE[prisch$school_name == "WEST GROVE PRIMARY SCHOOL"] <- 1.3446809
prisch$LONGITUDE[prisch$school_name == "HWHITE SANDS PRIMARY SCHOOL"] <- 103.9612546
prisch$LATITUDE[prisch$school_name == "WHITE SANDS PRIMARY SCHOOL"] <- 1.365473
gd_prisch <- prisch %>%
filter(gifted_ind == "Yes")
Results above show that:
Note: This section will not be ran except for Section 6.5 and Section 6.6 to save computational time. Documentation on the results are still written even though the output is not shown.
The variables that we will prepare in this section are:
Locational factors
We will then combine the above variables with resale dataset which consists of the prepared Structural factors such as:
This function does the following:
convert_sf <- function(df, x, y) {
st_as_sf(df,
coords = c(x, y),
crs=4326) %>%
st_transform(crs=3414)
}
resale_sf <- convert_sf(resale_new, "LONGITUDE", "LATITUDE")
eldercare_sf <- convert_sf(eldercare, "Lng", "Lat")
hawker_sf <- convert_sf(hawker, "Lng", "Lat")
park_sf <- convert_sf(park, "Lng", "Lat")
gd_prisch_sf <- convert_sf(gd_prisch, "LONGITUDE", "LATITUDE")
mall_sf <- convert_sf(mall, "longitude", "latitude")
supermarket_sf <- convert_sf(supermarket, "Lng", "Lat")
kindergarten_sf <- convert_sf(kindergarten, "Lng", "Lat")
childcare_sf <- convert_sf(childcare, "Lng", "Lat")
prisch_sf <- convert_sf(prisch, "LONGITUDE", "LATITUDE")
longitude <- 103.851784
latitude <- 1.287953
cbd_coorinates_sf <- data.frame(longitude, latitude) %>%
st_as_sf(., coords = c("longitude", "latitude"), crs=4326) %>%
st_transform(crs=3414)
This function does the following:
st_distance() of sf package to calculate distance between 2 sf objects, with a matrix being the output
drop_units() of units package to remove the units
data.frame() of Base R to convert matrix into a dataframe
rowMins() of matrixStats package to get the minimum value of each row i.e. shortest distance of variable to each HDB resale flat.
calulate_prox <- function(df1, df2, var) {
distance_matrix <- st_distance(df1, df2) %>%
drop_units()
df1[,var] <- rowMins(distance_matrix)
return(df1)
}
resale_sf <- calulate_prox(resale_sf, cbd_coorinates_sf, "cbd_prox") %>%
calulate_prox(., eldercare_sf, "eldercare_prox") %>%
calulate_prox(., hawker_sf, "hawker_prox") %>%
calulate_prox(., mrt_sf, "mrt_prox") %>%
calulate_prox(., park_sf, "park_prox") %>%
calulate_prox(., gd_prisch_sf, "gd_prisch_prox") %>%
calulate_prox(., mall_sf, "mall_prox") %>%
calulate_prox(., supermarket_sf, "supermarket_prox")
This function does the following:
calculate_num <- function(df1, df2, distance, var) {
distance_matrix <- st_distance(df1, df2) %>%
drop_units()
distance_matrix <- data.frame(distance_matrix)
df1[,var] <- rowSums(distance_matrix <= distance)
return(df1)
}
resale_sf <- calculate_num(resale_sf, kindergarten_sf, 350, "kindergarten_num") %>%
calculate_num(., childcare_sf, 350, "childcare_num") %>%
calculate_num(., bus_sf, 350, "busstop_num") %>%
calculate_num(., prisch_sf, 1000, "prisch_num")
col_order <- c("month", "town", "flat_type", "remaining_lease", "address",
"resale_price", "floor_area_sqm", "remaining_lease_final",
"01 TO 03", "04 TO 06", "07 TO 09", "10 TO 12", "13 TO 15", "16 TO 18",
"19 TO 21", "22 TO 24", "25 TO 27", "28 TO 30", "31 TO 33", "34 TO 36",
"37 TO 39", "40 TO 42", "43 TO 45", "46 TO 48", "49 TO 51",
'cbd_prox', "eldercare_prox", "hawker_prox", "mrt_prox", "park_prox",
"gd_prisch_prox", "mall_prox", "supermarket_prox",
"kindergarten_num", "childcare_num", "busstop_num", "prisch_num", "geometry")
resale_sf <- resale_sf[, col_order]
glimpse(resale_sf)
st_write(resale_sf, "data/geospatial/final_resale.shp")
resale_sf <- st_read(dsn="data/geospatial", layer="final_resale")
Reading layer `final_resale' from data source
`C:\wlwong2018\IS415 Blog\_posts\2021-11-06-take-home-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 15853 features and 37 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 11597.31 ymin: 28217.39 xmax: 42623.63 ymax: 48741.06
Projected CRS: SVY21 / Singapore TM
resale_sf <- resale_sf %>%
dplyr::rename(resale_price = rsl_prc, floor_area_sqm = flr_r_s, remaining_lease_final = rmnng__,
lvl_01_TO_03 = X01TO03, lvl_04_TO_06 = X04TO06, lvl_07_TO_09 = X07TO09,
lvl_10_TO_12 = X10TO12, lvl_13_TO_15 = X13TO15, lvl_16_TO_18 = X16TO18,
lvl_19_TO_21 = X19TO21, lvl_22_TO_24 = X22TO24, lvl_25_TO_27 = X25TO27,
lvl_28_TO_30 = X28TO30, lvl_31_TO_33 = X31TO33, lvl_34_TO_36 = X34TO36,
lvl_37_TO_39 = X37TO39, lvl_40_TO_42 = X40TO42, lvl_43_TO_45 = X43TO45,
lvl_46_TO_48 = X46TO48, lvl_49_TO_51 = X49TO51,
cbd_prox = cbd_prx, eldercare_prox = eldrcr_, hawker_prox = hwkr_pr,
mrt_prox = mrt_prx, park_prox = prk_prx, gd_prisch_prox = gd_prs_,
mall_prox = mll_prx, supermarket_prox = sprmrk_, kindergarten_num = kndrgr_,
childcare_num = chldcr_, busstop_num = bsstp_n, prisch_num = prsch_n)
In this section, we will perform some Exploratory Data Analysis to understand our dataset.
mul_hist <- function(df, vnam, x_axis) {
ggplot(data=df, aes(x=vnam)) +
geom_histogram(bins=20, color="black", fill="salmon") +
scale_x_continuous(name = x_axis)
}
mul_hist(resale_sf, resale_sf$resale_price, "Resale Price")

Results above show that:
resale_sf <- resale_sf %>%
mutate(`resale_price_log` = log(resale_price))
mul_hist(resale_sf, resale_sf$resale_price_log, "Resale Price Log")

Results above show that:
ggarrange(mul_hist(resale_sf, resale_sf$floor_area_sqm, "Floor Area Sqm"),
mul_hist(resale_sf, resale_sf$remaining_lease_final, "Remaining Lease"),
mul_hist(resale_sf, resale_sf$cbd_prox, "Proximity to CBD"),
mul_hist(resale_sf, resale_sf$eldercare_prox, "Proximity to Eldercare"),
mul_hist(resale_sf, resale_sf$hawker_prox, "Proximity to Hawker Centre"),
mul_hist(resale_sf, resale_sf$mrt_prox, "Proximity to Mrt"),
mul_hist(resale_sf, resale_sf$park_prox, "Proximity to Park"),
mul_hist(resale_sf, resale_sf$gd_prisch_prox, "Proximity to Good Pri. School"),
mul_hist(resale_sf, resale_sf$mall_prox, "Proximity to Shopping Mall"),
mul_hist(resale_sf, resale_sf$supermarket_prox, "Proximity to Supermarket"),
ncol = 2, nrow = 5)

Results above show that:
Structural Factors:
Locational Factors:
mul_bar <- function(df, vnam, x_axis){
ggplot(data=df, aes(x=factor(vnam))) +
geom_bar(color="black", fill="darkseagreen3") +
scale_x_discrete(name = x_axis)
}
ggarrange(mul_bar(resale_sf, resale_sf$kindergarten_num, "No. of Kindergartens (within 350m)"),
mul_bar(resale_sf, resale_sf$childcare_num, "No. of Childcare Centres (within 350m)"),
mul_bar(resale_sf, resale_sf$busstop_num, "No. of Bus Stops (within 350m)"),
mul_bar(resale_sf, resale_sf$prisch_num, "No. of Pri. School (within 1km)"),
ncol = 2, nrow = 2)

Results above show that:
Locational Factors:
mul_bar_level <- function(df, vnam, x_axis){
ggplot(data=df, aes(x=factor(vnam))) +
geom_bar(color="black", fill="steelblue3") +
scale_x_discrete(name = x_axis) + coord_flip()
}
ggarrange(mul_bar_level(resale_sf, resale_sf$lvl_01_TO_03, "Level 1 to 3"),
mul_bar_level(resale_sf, resale_sf$lvl_04_TO_06, "Level 4 to 6"),
mul_bar_level(resale_sf, resale_sf$lvl_07_TO_09, "Level 7 to 9"),
mul_bar_level(resale_sf, resale_sf$lvl_10_TO_12, "Level 10 to 12"),
mul_bar_level(resale_sf, resale_sf$lvl_13_TO_15, "Level 13 to 15"),
mul_bar_level(resale_sf, resale_sf$lvl_16_TO_18, "Level 16 to 18"),
mul_bar_level(resale_sf, resale_sf$lvl_19_TO_21, "Level 19 to 21"),
mul_bar_level(resale_sf, resale_sf$lvl_22_TO_24, "Level 22 to 24"),
mul_bar_level(resale_sf, resale_sf$lvl_25_TO_27, "Level 25 to 27"),
mul_bar_level(resale_sf, resale_sf$lvl_28_TO_30, "Level 28 to 30"),
mul_bar_level(resale_sf, resale_sf$lvl_31_TO_33, "Level 31 to 33"),
mul_bar_level(resale_sf, resale_sf$lvl_34_TO_36, "Level 34 to 36"),
mul_bar_level(resale_sf, resale_sf$lvl_37_TO_39, "Level 37 to 39"),
mul_bar_level(resale_sf, resale_sf$lvl_40_TO_42, "Level 40 to 42"),
mul_bar_level(resale_sf, resale_sf$lvl_43_TO_45, "Level 43 to 45"),
mul_bar_level(resale_sf, resale_sf$lvl_46_TO_48, "Level 46 to 48"),
mul_bar_level(resale_sf, resale_sf$lvl_49_TO_51, "Level 49 to 51"),
ncol = 2, nrow = 9)

Results above show that:
Structural Factor:
Reveal the geospatial distribution of HDB resale prices (dependent variable) in Singapore
Create an interactive point symbol map by using tmap_mode(“view”)
Use tmap_mode(“plot”) to display plot mode of tmap
All of the above are using tmap package
tmap_options(check.and.fix = TRUE)
tmap_mode("view")
tm_shape(mpsz_sf)+
tm_polygons() +
tm_shape(resale_sf) +
tm_dots(col = "resale_price",
alpha = 0.6,
style="quantile") +
tm_view(set.zoom.limits = c(11,14))
tmap_mode("plot")
Results above show that:
In this section, we will build hedonic pricing models for HDB resale units using lm() of Base R.
Before we start calibrating the Hedonic Pricing Model, we have to ensure that the independent variables used are not highly correlated to each other. This is called multicollinearity in statistics.
resale_sf_nogeom <- resale_sf %>%
st_drop_geometry()
corrplot(cor(resale_sf_nogeom[, 7:37]), diag = FALSE, order = "AOE",
tl.pos = "td", tl.cex = 0.5, method = "number", type = "upper")

Results above show that:
resale.mlr <- lm(formula = resale_price ~ floor_area_sqm + remaining_lease_final + lvl_01_TO_03 + lvl_04_TO_06 + lvl_07_TO_09 + lvl_10_TO_12 + lvl_13_TO_15 + lvl_16_TO_18 + lvl_19_TO_21 + lvl_22_TO_24 + lvl_25_TO_27 + lvl_28_TO_30 + lvl_31_TO_33 + lvl_34_TO_36 + lvl_37_TO_39 + lvl_40_TO_42 + lvl_43_TO_45 + lvl_46_TO_48 + lvl_49_TO_51 + cbd_prox + eldercare_prox + hawker_prox + mrt_prox + park_prox + gd_prisch_prox + mall_prox + supermarket_prox + kindergarten_num + childcare_num + busstop_num + prisch_num, data=resale_sf)
summary(resale.mlr)
Call:
lm(formula = resale_price ~ floor_area_sqm + remaining_lease_final +
lvl_01_TO_03 + lvl_04_TO_06 + lvl_07_TO_09 + lvl_10_TO_12 +
lvl_13_TO_15 + lvl_16_TO_18 + lvl_19_TO_21 + lvl_22_TO_24 +
lvl_25_TO_27 + lvl_28_TO_30 + lvl_31_TO_33 + lvl_34_TO_36 +
lvl_37_TO_39 + lvl_40_TO_42 + lvl_43_TO_45 + lvl_46_TO_48 +
lvl_49_TO_51 + cbd_prox + eldercare_prox + hawker_prox +
mrt_prox + park_prox + gd_prisch_prox + mall_prox + supermarket_prox +
kindergarten_num + childcare_num + busstop_num + prisch_num,
data = resale_sf)
Residuals:
Min 1Q Median 3Q Max
-189805 -38452 -1878 34805 453664
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.002e+05 1.333e+04 15.017 < 2e-16 ***
floor_area_sqm 2.798e+03 7.169e+01 39.027 < 2e-16 ***
remaining_lease_final 4.166e+03 4.431e+01 94.010 < 2e-16 ***
lvl_01_TO_03 -7.412e+04 1.042e+04 -7.114 1.17e-12 ***
lvl_04_TO_06 -5.389e+04 1.039e+04 -5.188 2.15e-07 ***
lvl_07_TO_09 -4.024e+04 1.037e+04 -3.879 0.000105 ***
lvl_10_TO_12 -3.264e+04 1.037e+04 -3.147 0.001653 **
lvl_13_TO_15 -2.783e+04 1.039e+04 -2.679 0.007401 **
lvl_16_TO_18 -2.223e+04 1.051e+04 -2.115 0.034459 *
lvl_19_TO_21 -2.551e+02 1.089e+04 -0.023 0.981307
lvl_22_TO_24 -1.769e+03 1.100e+04 -0.161 0.872313
lvl_25_TO_27 3.888e+04 1.165e+04 3.338 0.000846 ***
lvl_28_TO_30 1.086e+05 1.203e+04 9.029 < 2e-16 ***
lvl_31_TO_33 1.260e+05 1.445e+04 8.717 < 2e-16 ***
lvl_34_TO_36 1.187e+05 1.513e+04 7.841 4.74e-15 ***
lvl_37_TO_39 1.797e+05 1.489e+04 12.070 < 2e-16 ***
lvl_40_TO_42 2.154e+05 1.652e+04 13.037 < 2e-16 ***
lvl_43_TO_45 2.540e+05 3.620e+04 7.017 2.37e-12 ***
lvl_46_TO_48 2.720e+05 3.622e+04 7.509 6.28e-14 ***
lvl_49_TO_51 3.385e+05 4.373e+04 7.741 1.05e-14 ***
cbd_prox -1.772e+01 1.959e-01 -90.412 < 2e-16 ***
eldercare_prox -1.580e+01 7.881e-01 -20.055 < 2e-16 ***
hawker_prox -1.965e+01 1.020e+00 -19.265 < 2e-16 ***
mrt_prox -3.220e+01 1.365e+00 -23.590 < 2e-16 ***
park_prox -1.026e+01 1.207e+00 -8.503 < 2e-16 ***
gd_prisch_prox 2.260e+00 2.283e-01 9.901 < 2e-16 ***
mall_prox -2.013e+01 1.472e+00 -13.676 < 2e-16 ***
supermarket_prox -2.376e+01 3.293e+00 -7.216 5.60e-13 ***
kindergarten_num 7.348e+03 5.013e+02 14.657 < 2e-16 ***
childcare_num -4.168e+03 2.778e+02 -15.004 < 2e-16 ***
busstop_num 8.111e+02 1.775e+02 4.570 4.92e-06 ***
prisch_num -8.710e+03 3.893e+02 -22.373 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 59990 on 15821 degrees of freedom
Multiple R-squared: 0.7514, Adjusted R-squared: 0.7509
F-statistic: 1542 on 31 and 15821 DF, p-value: < 2.2e-16
Results above show that:
ols_regress() to print model summary and parameter estimates.
resale.mlr1 <- lm(formula = resale_price ~ floor_area_sqm + remaining_lease_final + lvl_01_TO_03 + lvl_04_TO_06 + lvl_07_TO_09 + lvl_10_TO_12 + lvl_13_TO_15 + lvl_16_TO_18 + lvl_25_TO_27 + lvl_28_TO_30 + lvl_31_TO_33 + lvl_34_TO_36 + lvl_37_TO_39 + lvl_40_TO_42 + lvl_43_TO_45 + lvl_46_TO_48 + lvl_49_TO_51 + cbd_prox + eldercare_prox + hawker_prox + mrt_prox + park_prox + gd_prisch_prox + mall_prox + supermarket_prox + kindergarten_num + childcare_num + busstop_num + prisch_num, data=resale_sf)
ols_regress(resale.mlr1)
Model Summary
----------------------------------------------------------------------
R 0.867 RMSE 59985.065
R-Squared 0.751 Coef. Var 13.835
Adj. R-Squared 0.751 MSE 3598208060.583
Pred R-Squared 0.750 MAE 46091.675
----------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
--------------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------------------
Regression 1.720769e+14 29 5.933685e+12 1649.067 0.0000
Residual 5.693445e+13 15823 3598208060.583
Total 2.290113e+14 15852
--------------------------------------------------------------------------------
Parameter Estimates
---------------------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
---------------------------------------------------------------------------------------------------------------
(Intercept) 199337.591 8764.453 22.744 0.000 182158.264 216516.917
floor_area_sqm 2798.007 71.673 0.165 39.039 0.000 2657.520 2938.495
remaining_lease_final 4165.887 44.265 0.445 94.113 0.000 4079.123 4252.651
lvl_01_TO_03 -73296.939 2946.201 -0.232 -24.878 0.000 -79071.827 -67522.050
lvl_04_TO_06 -53071.218 2882.078 -0.186 -18.414 0.000 -58720.418 -47422.017
lvl_07_TO_09 -39416.337 2884.798 -0.133 -13.663 0.000 -45070.869 -33761.806
lvl_10_TO_12 -31816.199 2901.167 -0.103 -10.967 0.000 -37502.817 -26129.580
lvl_13_TO_15 -27011.077 3045.041 -0.067 -8.871 0.000 -32979.704 -21042.450
lvl_16_TO_18 -21411.594 3367.917 -0.039 -6.358 0.000 -28013.096 -14810.093
lvl_25_TO_27 39697.132 6014.804 0.029 6.600 0.000 27907.432 51486.832
lvl_28_TO_30 109436.846 6938.411 0.068 15.773 0.000 95836.770 123036.922
lvl_31_TO_33 126797.840 10363.903 0.050 12.235 0.000 106483.409 147112.271
lvl_34_TO_36 119499.374 11294.187 0.043 10.581 0.000 97361.480 141637.268
lvl_37_TO_39 180558.879 10969.731 0.067 16.460 0.000 159056.956 202060.801
lvl_40_TO_42 216253.102 13089.795 0.067 16.521 0.000 190595.613 241910.591
lvl_43_TO_45 254868.128 34767.850 0.029 7.331 0.000 186719.181 323017.076
lvl_46_TO_48 272834.979 34788.857 0.031 7.843 0.000 204644.856 341025.101
lvl_49_TO_51 339310.275 42543.558 0.032 7.976 0.000 255920.055 422700.495
cbd_prox -17.716 0.196 -0.611 -90.425 0.000 -18.100 -17.332
eldercare_prox -15.804 0.788 -0.087 -20.055 0.000 -17.348 -14.259
hawker_prox -19.647 1.020 -0.084 -19.267 0.000 -21.645 -17.648
mrt_prox -32.200 1.365 -0.104 -23.590 0.000 -34.876 -29.525
park_prox -10.266 1.207 -0.038 -8.504 0.000 -12.632 -7.899
gd_prisch_prox 2.261 0.228 0.058 9.903 0.000 1.813 2.708
mall_prox -20.128 1.472 -0.064 -13.676 0.000 -23.013 -17.243
supermarket_prox -23.756 3.293 -0.031 -7.215 0.000 -30.210 -17.302
kindergarten_num 7348.450 501.243 0.062 14.660 0.000 6365.957 8330.943
childcare_num -4167.025 277.731 -0.068 -15.004 0.000 -4711.410 -3622.640
busstop_num 810.821 177.478 0.020 4.569 0.000 462.944 1158.698
prisch_num -8707.578 389.089 -0.112 -22.379 0.000 -9470.237 -7944.920
---------------------------------------------------------------------------------------------------------------
To perform regression modelling on spatial data, we first have to ensure that the residuals are not correlated with each other.
ols_vif_tol(resale.mlr1)
Variables Tolerance VIF
1 floor_area_sqm 0.8749787 1.142885
2 remaining_lease_final 0.7029250 1.422627
3 lvl_01_TO_03 0.1808740 5.528711
4 lvl_04_TO_06 0.1539728 6.494654
5 lvl_07_TO_09 0.1657612 6.032773
6 lvl_10_TO_12 0.1785631 5.600261
7 lvl_13_TO_15 0.2743421 3.645084
8 lvl_16_TO_18 0.4161241 2.403129
9 lvl_25_TO_27 0.8149313 1.227097
10 lvl_28_TO_30 0.8540849 1.170844
11 lvl_31_TO_33 0.9326623 1.072199
12 lvl_34_TO_36 0.9420581 1.061506
13 lvl_37_TO_39 0.9363147 1.068017
14 lvl_40_TO_42 0.9558749 1.046162
15 lvl_43_TO_45 0.9924109 1.007647
16 lvl_46_TO_48 0.9912128 1.008865
17 lvl_49_TO_51 0.9941306 1.005904
18 cbd_prox 0.3440807 2.906295
19 eldercare_prox 0.8325543 1.201123
20 hawker_prox 0.8210584 1.217940
21 mrt_prox 0.8015666 1.247557
22 park_prox 0.7736125 1.292637
23 gd_prisch_prox 0.4615442 2.166640
24 mall_prox 0.7133221 1.401891
25 supermarket_prox 0.8461031 1.181889
26 kindergarten_num 0.8831753 1.132278
27 childcare_num 0.7568531 1.321260
28 busstop_num 0.8561472 1.168023
29 prisch_num 0.6316363 1.583190
Results above show that:
ols_plot_resid_fit(resale.mlr1)

Results above show that:
ols_plot_resid_hist(resale.mlr1)

Results above show that:
The hedonic model is using geographically referenced attributes, hence it is also important for us to visualise the residual of the hedonic pricing model.
Since we also have to assume that the residuals are not distributed randomly, we need to combine resale_sf with resale.mlr1, then convert it into a SpatialPointsDataFrame to perform spatial autocorrelation test,
mlr.output <- as.data.frame(resale.mlr1$residuals)